Creating annotated resources for polarity classification in Czech
Abstract
This paper presents the first steps towards reliable polarity classification based on Czech data. We describe a method for annotating Czech evaluative structures and build a standard unigram-based Naive Bayes classifier on three different types of annotated texts. We also analyze existing results for both manual and automatic annotation; some of these are promising and close to state-of-the-art performance (see Cui, 2006).
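To make the classifier mentioned in the abstract concrete, the following is a minimal sketch of a unigram-based multinomial Naive Bayes model. It is not the authors' implementation: the whitespace tokenizer, the add-one smoothing parameter `alpha`, the class name `UnigramNaiveBayes`, and the four toy Czech training sentences are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class UnigramNaiveBayes:
    """Multinomial Naive Bayes over unigram counts (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # additive smoothing (assumed)
        self.class_counts = Counter()            # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase + whitespace split; real Czech data would
        # likely need proper tokenization and possibly lemmatization.
        return text.lower().split()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for tok in self.tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def log_scores(self, text):
        """Return the unnormalized log posterior for each class."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            n_tokens = sum(self.word_counts[label].values())
            for tok in self.tokenize(text):
                if tok not in self.vocab:
                    continue  # skip tokens never seen in training
                count = self.word_counts[label][tok]
                score += math.log(
                    (count + self.alpha) / (n_tokens + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy data, not taken from the paper's corpora.
train_texts = [
    "skvělý film a výborný příběh",
    "výborný herec",
    "hrozný film a nudný příběh",
    "nudný herec",
]
train_labels = ["pos", "pos", "neg", "neg"]

clf = UnigramNaiveBayes().fit(train_texts, train_labels)
print(clf.predict("výborný film"))
print(clf.predict("nudný film"))
```

The model scores each class by adding the log prior to the smoothed log likelihood of every known token, so a word such as "film", which appears equally often in both classes, does not shift the decision.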
Similar resources
On the Linguistic Structure of Emotional Meaning in Czech
This thesis has two main goals. First, we provide an analysis of language means which together form an emotional meaning of written utterances in Czech. Second, we employ the findings concerning emotional language in computational applications. We provide a systematic overview of lexical, morphosyntactic, semantic and pragmatic aspects of emotional meaning in Czech utterances. Also, we propose ...
Czech Subjectivity Lexicon: A Lexical Resource for Czech Polarity Classification
This paper introduces Czech subjectivity lexicon – the new lexical resource for sentiment analysis in Czech. The lexicon is a dictionary of 4947 evaluative items annotated with part of speech and tagged with positive or negative polarity. We describe the method for building the basic vocabulary and the criteria for its manual refinement. Also, we suggest possible enrichment of the fundamental l...
Annotate-Sample-Average (ASA): A New Distant Supervision Approach for Twitter Sentiment Analysis
The classification of tweets into polarity classes is a popular task in sentiment analysis. State-of-the-art solutions to this problem are based on supervised machine learning models trained from manually annotated examples. A drawback of these approaches is the high cost involved in data annotation. Two freely available resources that can be exploited to solve the problem are: 1) large amounts...
RA-SR: Using a ranking algorithm to automatically building resources for subjectivity analysis over annotated corpora
In this paper we propose a method that uses corpora where phrases are annotated as Positive, Negative, Objective and Neutral, to achieve new sentiment resources involving words dictionaries with their associated polarity. Our method was created to build sentiment words inventories based on sentisemantic evidences obtained after exploring text with annotated sentiment polarity information. Throu...
Prague Czech-English Dependency Treebank. Syntactically Annotated Resources for Machine Translation
This paper introduces the Prague Czech-English Dependency Treebank (PCEDT), a new Czech-English parallel resource suitable for experiments in structural machine translation. We describe the process of building the core parts of the resources – a bilingual syntactically annotated corpus and translation dictionaries. A part of the Penn Treebank has been translated into Czech, the dependency annot...
Publication date: 2012